Hello CogSci’s :)

In this portfolio, you are asked to do four tasks:

- Make a DAG for something

- Simulate data that fits the DAG

- Use linear models to confirm that the DAG fits the data

- Mess it up.

Each of the four tasks have some sub-steps.
Report briefly what you find, for example in a markdown document, for example called report.md so that the poor TA can easily get an overview before looking in your code :)

Then you can also make a (brief!) explanation of the phenomenon you are DAGGIN, simulating and modelling.

Looking forward!

Task 1: The DAG

- Come up with an incredibly interesting and scientifically important made-up example for a phenomenon to investigate. Decide on two variables (an outcome and a predictor) that you would like to investigate the relation between. If in doubt, you can be inspired by Peter’s amazing example on the next page.

In our example, we wish to investigate the effect of the continuous predictor corona fear (CF) on the outcome variable energy level (also continuous).

Predictor: corona fear (continuous variable). This variable depends on OCD, as we expect OCD to affect your corona fear in such way that a higher level of OCD increases your level of corona fear.

Outcome variable: energy level (continuous variable), depends on germ level as we expect the level of germs to determine your energy level. With a low level of germs, the germs will not take up much energy and nutrients from your body, but with a higher number of germs in your system, the germs will take more nutrients and energy.

OCD: someone’s level of OCD describes as a continuous variable for the sake of this assignment. This variable is independent from all other variables in our DAG but acts as a common cause of corona fear and sanitizing.

Sanitizing: a continuous variable dependent on both corona fear and OCD. It is dependent on corona fear as we expect that you will sanitize more as your fear of corona increases. It is dependent on OCD as a typical symptom of OCD is an abnormal fear of germs, which we assume will result in highly frequent use of sanitizer. Thus, we assume that the amount of sanitizing will increase with the level of OCD.

Social exposure: a measure of how much social mingling you engage in (a combined measure of social circle size, social activities etc.). This variable is also dependent on corona fear, as we assume that a person who is really scared of Corona would isolate more and thus have a much lower social exposure, whereas a person who didn’t fear corona at all wouldn’t worry about meeting people and thus have a correspondingly larger social exposure.

Germ level: a continuous variable describing the level of vira germs in your body. For the sake of simplicity, we assume that the level of germs present in your body can be described as a normally distributed continuous variable. This variable is dependent on the social exposure variable as well as sanitizing. It depends on social exposure, as a lot of social exposure implies more exposure to germs. It depends on sanitizing as sanitizing works by killing most germs, preventing them from entering your body.

- Make a DAG for the phenomenon. Make it medium complicated: that means, make sure there are some different kinds of relations (see next step). Change it if you don’t get anything interesting for the next steps.
Draw it somehow (on paper, in R, laser engraved in diamond).
Code it in dagitty (this is a nice tool: http://dagitty.net/dags.html )

dag_corona <- dagitty( "dag {
    corona_fear -> sanitizing
    sanitizing-> germ_level
    OCD -> sanitizing
    OCD -> corona_fear
    social_exposure -> germ_level
    corona_fear -> social_exposure
    germ_level -> energylvl
}")

# Plotting the DAG
coordinates(dag_corona) <- list( 
  x=c(corona_fear =-1, OCD = -3, social_exposure =0, sanitizing =-2, germ_level=-1, energylvl= -1) , 
  y=c(corona_fear =-1, OCD = -1, social_exposure =0, sanitizing = 0, germ_level=1, energylvl=2) )

drawdag(dag_corona)

- Find elemental forms of variable relations in the DAG (i.e., forks, pipes, colliders, and their descendants).

Pipe

(Corona Fear –> Social Exposure –> Germ Level)

(Corona Fear -> Sanitizing -> Germ Level)

(OCD –> Corona Fear –> Social Exposure)

(OCD –> Corona Fear –> Sanitizing)

(OCD –> Sanitizing –> Germ Level)

(Sanitizing –> Germ Level –> EnergyLevel)

(Social Exposure -> Germ Level -> Energy Level)

Fork

(Sanitizing <- Corona Fear -> Social Exposure)

(Corona Fear <- OCD -> Sanitizing)

Collider

(Sanitizing -> Germ Level <- Social Exposure)

(Corona Fear-> Sanitizing <- OCD)

Descendant

(OCD –> Corona Fear –> Social Exposure, Corona Fear –> Sanitizing )

- Find out what variables to include (and not include) in a multiple linear regression to avoid ‘back door’ (AKA non-causal) paths. Do this first with your eyes and your mind. Then you can use dagitty’s function adjustmentSets().

# Shutting the backdoor - analyzing the graph to block the backdoor
adjustmentSets(dag_corona,exposure="corona_fear", outcome="energylvl")
## { OCD }

OCD is a backdoor that we should control for (i.e., include in a our model) in order to find the full causal effect of corona fear on energy level.

- Find out which conditional independencies the DAG implies. First with the mind, then with daggity’s function impliedConditionalIndependencies().

# Deriving our DAG’s conditional independencies
impliedConditionalIndependencies(dag_corona)
## OCD _||_ enrg | grm_
## OCD _||_ enrg | sntz, scl_
## OCD _||_ enrg | crn_, sntz
## OCD _||_ grm_ | sntz, scl_
## OCD _||_ grm_ | crn_, sntz
## OCD _||_ scl_ | crn_
## crn_ _||_ enrg | grm_
## crn_ _||_ enrg | sntz, scl_
## crn_ _||_ grm_ | sntz, scl_
## enrg _||_ sntz | grm_
## enrg _||_ scl_ | grm_
## sntz _||_ scl_ | crn_

The above output shows the testable implications of our DAG, which we will go through in task 3 to test if any of them can be falsified - i.e. to know if the DAG is compatible with the data. If any of the conditional independencies are not fulfilled, we can falsify our DAG.

- Find the full list of Markov equivalent DAGS. Use daggity’s function equivalentGraphs().

The equivalentGraphs() function outputs a set of DAGs with the same conditional independencies is known as a Markov equivalence set.

equivalentDAGs(dag_corona)
## [[1]]
## dag {
## OCD [pos="-3.000,-1.000"]
## corona_fear [pos="-1.000,-1.000"]
## energylvl [pos="-1.000,2.000"]
## germ_level [pos="-1.000,1.000"]
## sanitizing [pos="-2.000,0.000"]
## social_exposure [pos="0.000,0.000"]
## OCD -> corona_fear
## OCD -> sanitizing
## corona_fear -> sanitizing
## corona_fear -> social_exposure
## germ_level -> energylvl
## sanitizing -> germ_level
## social_exposure -> germ_level
## }
## 
## [[2]]
## dag {
## OCD [pos="-3.000,-1.000"]
## corona_fear [pos="-1.000,-1.000"]
## energylvl [pos="-1.000,2.000"]
## germ_level [pos="-1.000,1.000"]
## sanitizing [pos="-2.000,0.000"]
## social_exposure [pos="0.000,0.000"]
## OCD -> corona_fear
## OCD -> sanitizing
## corona_fear -> social_exposure
## germ_level -> energylvl
## sanitizing -> corona_fear
## sanitizing -> germ_level
## social_exposure -> germ_level
## }
## 
## [[3]]
## dag {
## OCD [pos="-3.000,-1.000"]
## corona_fear [pos="-1.000,-1.000"]
## energylvl [pos="-1.000,2.000"]
## germ_level [pos="-1.000,1.000"]
## sanitizing [pos="-2.000,0.000"]
## social_exposure [pos="0.000,0.000"]
## OCD -> corona_fear
## corona_fear -> social_exposure
## germ_level -> energylvl
## sanitizing -> OCD
## sanitizing -> corona_fear
## sanitizing -> germ_level
## social_exposure -> germ_level
## }
## 
## [[4]]
## dag {
## OCD [pos="-3.000,-1.000"]
## corona_fear [pos="-1.000,-1.000"]
## energylvl [pos="-1.000,2.000"]
## germ_level [pos="-1.000,1.000"]
## sanitizing [pos="-2.000,0.000"]
## social_exposure [pos="0.000,0.000"]
## OCD -> sanitizing
## corona_fear -> OCD
## corona_fear -> sanitizing
## corona_fear -> social_exposure
## germ_level -> energylvl
## sanitizing -> germ_level
## social_exposure -> germ_level
## }
## 
## [[5]]
## dag {
## OCD [pos="-3.000,-1.000"]
## corona_fear [pos="-1.000,-1.000"]
## energylvl [pos="-1.000,2.000"]
## germ_level [pos="-1.000,1.000"]
## sanitizing [pos="-2.000,0.000"]
## social_exposure [pos="0.000,0.000"]
## OCD -> sanitizing
## corona_fear -> OCD
## corona_fear -> sanitizing
## germ_level -> energylvl
## sanitizing -> germ_level
## social_exposure -> corona_fear
## social_exposure -> germ_level
## }
## 
## [[6]]
## dag {
## OCD [pos="-3.000,-1.000"]
## corona_fear [pos="-1.000,-1.000"]
## energylvl [pos="-1.000,2.000"]
## germ_level [pos="-1.000,1.000"]
## sanitizing [pos="-2.000,0.000"]
## social_exposure [pos="0.000,0.000"]
## corona_fear -> OCD
## corona_fear -> sanitizing
## corona_fear -> social_exposure
## germ_level -> energylvl
## sanitizing -> OCD
## sanitizing -> germ_level
## social_exposure -> germ_level
## }
## 
## [[7]]
## dag {
## OCD [pos="-3.000,-1.000"]
## corona_fear [pos="-1.000,-1.000"]
## energylvl [pos="-1.000,2.000"]
## germ_level [pos="-1.000,1.000"]
## sanitizing [pos="-2.000,0.000"]
## social_exposure [pos="0.000,0.000"]
## corona_fear -> OCD
## corona_fear -> sanitizing
## germ_level -> energylvl
## sanitizing -> OCD
## sanitizing -> germ_level
## social_exposure -> corona_fear
## social_exposure -> germ_level
## }
## 
## [[8]]
## dag {
## OCD [pos="-3.000,-1.000"]
## corona_fear [pos="-1.000,-1.000"]
## energylvl [pos="-1.000,2.000"]
## germ_level [pos="-1.000,1.000"]
## sanitizing [pos="-2.000,0.000"]
## social_exposure [pos="0.000,0.000"]
## corona_fear -> OCD
## corona_fear -> social_exposure
## germ_level -> energylvl
## sanitizing -> OCD
## sanitizing -> corona_fear
## sanitizing -> germ_level
## social_exposure -> germ_level
## }

The list outputted above shows us 8 DAGS that would have the exact same conditional independencies as our DAG, dag_corona, has. However, many of the causal relations between variables in the DAGs listed above would be incompatible with our scientific knowledge and we would have no reason to build statistical models based on these DAGs. For instance, we would have no reason to asumme that sanitizing will have a causal effect on OCD, as suggested by e.g. DAG 3, 6, 7, and 8 above.

Task 2: The data

- Simulate some data that fits the DAG. There are many ways to do this. A simple way is just to sample one variable from a normal distribution which has another variable as mean. McElreath does this in the book a few times, and you can use this as inspiration.

set.seed(3)
N = 1000

Simulating variables

# OCD
OCD <- rnorm(N, mean=5, sd =1)
hist(OCD)

# Corona fear
corona_fear <- rnorm(N, mean=20+(10*OCD), sd=4) 
hist(corona_fear)

# Social exposure 
social_exposure <- rnorm(N, mean= (35-(corona_fear/3)), sd= 1) #
plot(corona_fear,social_exposure, main="Plot: corona fear and social exposure")

# Sanitizing 
sanitizing <-  rnorm(N, mean= (corona_fear-8)+(OCD*2), sd=2)
hist(sanitizing)

# Germ level 
germ_level <-  rnorm(N, mean= 8000 - ((sanitizing*60)+(100*social_exposure)), sd=100)
hist(germ_level)

# Energy level
energylvl <-rnorm(N, mean=80-((germ_level/100)), sd=6) 
hist(energylvl)

df <- tibble(germ_level, sanitizing, energylvl, social_exposure, corona_fear, OCD) 

# creating df
#df$energylvl <- ifelse(df$energylvl=="0",1,2)

ls.str(df)
## corona_fear :  num [1:1000] 62.1 62.5 68.7 49.6 74.3 ...
## energylvl :  num [1:1000] 53.4 66.5 43.9 40 49.6 ...
## germ_level :  num [1:1000] 2847 2857 2372 3301 2514 ...
## OCD :  num [1:1000] 4.04 4.71 5.26 3.85 5.2 ...
## sanitizing :  num [1:1000] 65.1 61.4 69.6 50.6 77.2 ...
## social_exposure :  num [1:1000] 13.3 14.4 12.7 17.6 10.5 ...

Standardizing variables Standardizing variables:

df <- df %>% 
  mutate(germ_std=scale(germ_level),
         sanitizing_std= scale(sanitizing),
         energylvl_std=scale(energylvl),
         social_exposure_std=scale(social_exposure),
         corona_fear_std= scale(corona_fear),
         OCD_std = scale(OCD)
         )
#ls.str(df)

Task 3: Statistics

- Run multiple linear regressions to test the conditional independencies implied by your DAG. Make sure to avoid backdoor paths. See that the linear model shows the conditional independencies implied by your DAG, implying that the data and the DAG are compatible (if the linear model doesn’t show the conditional independencies implied by the DAG, the data and the DAG doesn’t fit).

OCD || enrg | grm_ OCD || enrg | sntz, scl_ OCD || enrg | crn_, sntz OCD || grm_ | sntz, scl_ OCD || grm_ | crn_, sntz OCD || scl_ | crn_

crn_ || enrg | grm_ crn_ || enrg | sntz, scl_ crn_ || grm_ | sntz, scl_ enrg || sntz | grm_ enrg || scl_ | grm_ sntz || scl_ | crn_

Testing the conditional independencies

Model 1: OCD || enrg | germ Testing whether the causal coefficient of energy_level becomes significantly close to zero when stratifying by germ level in a model that has OCD as outcome variable.

adjustmentSets(dag_corona,exposure="OCD", outcome="energylvl") 
##  {}
m1 <- quap(
    alist(
        OCD_std ~ dnorm( mu , sigma ) ,
        mu <- a + bEng*energylvl_std+bGermlvl*germ_std,
        a ~ dnorm( 0 , 0.2 ) ,
        bEng ~ dnorm( 0 , 1) ,
        bGermlvl ~ dnorm( 0 , 1) ,
        sigma ~ dexp( 1 ) 
    ) , data = df )

plot(coeftab(m1),by.model=TRUE)

As seen in the above plot, energy level becomes independent from OCF (i.e., the model does with almost no uncertainty estimate the coefficient bEng to be very close to zero) when stratifying by germ level.

Model 2: OCD || enrg | sntz, scl_

Testing whether the causal coefficient of energy level becomes significantly close to zero when stratifying by sanitizing and social exposure in a model that has OCD as outcome variable.

adjustmentSets(dag_corona,exposure="energylvl", outcome="OCD") # choosing { sanitizing, social_exposure }
## { corona_fear, sanitizing }
## { sanitizing, social_exposure }
## { germ_level }
m2 <- quap(
    alist(
        OCD_std ~ dnorm( mu , sigma ) ,
        mu <- a + bEng*energylvl_std + bSan*sanitizing_std+bSoc*social_exposure_std ,
        a ~ dnorm( 0 , 0.2 ) ,
        bEng ~ dnorm( 0 , 1) ,
        bSan ~ dnorm( 0 , 1 ) ,
        bSoc ~ dnorm( 0 , 1 ) ,
        sigma ~ dexp( 1 ) # OBS : why exp? It doesn't change anything changing it to dnorm or dunif
    ) , data = df )

plot(coeftab(m2),by.model=TRUE)

precis(m2)

As seen in the above plot, energy level becomes independent from OCF (i.e., the model does with almost no uncertainty estimate the coefficient bEng to be very close to zero) when stratifying by energy level, social exposure and

Model 3: OCD || enrg | crn_, sntz Testing whether the causal coefficient of energy level becomes significantly close to zero when stratifying by corona fear and sanitizing in a model that has OCD as outcome variable.

adjustmentSets(dag_corona,exposure="energylvl", outcome="OCD") # { corona_fear, sanitizing }
## { corona_fear, sanitizing }
## { sanitizing, social_exposure }
## { germ_level }
m3<- quap(
    alist(
        OCD_std ~ dnorm( mu , sigma ) ,
        mu <- a  + bEng*energylvl_std + bCoronaFear*corona_fear_std + bSanitizing*sanitizing_std,
        a ~ dnorm( 0 , 0.2 ) ,
        bEng~ dnorm( 0 , 1 ) ,
        bCoronaFear~ dnorm( 0 , 1 ),
        bSanitizing ~ dnorm( 0 , 1 ),
        sigma ~ dexp( 1 )
    ) , data = df )


plot(coeftab(m3),by.model=TRUE)

As seen in the above plot, bEng becomes independent from OCD (i.e., the model does with almost no uncertainty estimate the coefficient bEng to be very close to zero) when stratifying by sanitizing and corona fear.

Model 4: OCD || grm_ | sntz, scl_ Testing whether the causal coefficient of germ level becomes significantly close to zero when stratifying by sanitizing and social exposure in a model that has OCD as outcome variable.

m4 <- quap(
    alist(
        OCD_std ~ dnorm( mu , sigma ) ,
        mu <- a  + bGermLevel*germ_std + bSanitizing*sanitizing_std + bSoc*social_exposure_std,
        a ~ dnorm( 0 , 0.2 ) ,
        bGermLevel ~ dnorm( 0 , 1 ),
        bSanitizing ~ dnorm( 0 , 1 ),
        bSoc~ dnorm( 0 , 1 ) ,
        sigma ~ dexp( 1 )
    ) , data = df )

plot(coeftab(m4),by.model=TRUE)

As seen in the above plot, benergylvl becomes independent from OCD when stratifying by germ level. That is, given the data and our model, posterior mean seems to be pretty close to zero.

Model 5: OCD || grm_ | crn_, sntz Testing whether the causal coefficient of germ level becomes significantly close to zero when stratifying by corona fear and sanitizing in a model that has OCD as outcome variable.

adjustmentSets(dag_corona,exposure="germ_level", outcome="OCD") # OCD _||_ grm_ | crn_, sntz
## { corona_fear, sanitizing }
## { sanitizing, social_exposure }
m5 <-  quap(
    alist(
        OCD_std ~ dnorm( mu , sigma ) ,
        mu <- a + bGerm*germ_std  + bCoronaFear*corona_fear_std + bSanitizing*sanitizing_std,
        a ~ dnorm( 0 , 0.2 ) ,
        bGerm~ dnorm( 0 , 1 ) ,
        bCoronaFear~ dnorm( 0 , 1 ),
        bSanitizing~ dnorm( 0 , 1 ),
        sigma ~ dexp( 1 )
    ) , data = df )

plot(coeftab(m5),by.model=TRUE)

As seen in the above plot, germ level becomes independent from OCD (i.e., the model does with almost no uncertainty estimate the coefficient bGerm (germ level) to be very close to zero) when stratifying by face corona fear and sanitizing.

Model 6:OCD || scl_ | crn_ Testing whether the causal coefficient of social exposure becomes significantly close to zero when stratifying by corona fear in a model that has OCD as outcome variable.

adjustmentSets(dag_corona,exposure="social_exposure", outcome="OCD") #OCD _||_ scl_ | crn_
## { corona_fear }
m6<- quap(
    alist(
        OCD_std ~ dnorm( mu , sigma ) ,
        mu <- a + bCorona_fear*corona_fear_std + bSoExp* social_exposure_std,
        a ~ dnorm( 0 , 0.2 ) ,
        bCorona_fear~ dnorm( 0 , 1 ),
        bSoExp~ dnorm( 0 , 1 ),
        sigma ~ dexp( 1 )
    ) , data = df )

plot(coeftab(m6),by.model=TRUE)

As seen in the above plot, social exposure becomes independent from OCD (i.e., the model does with almost no uncertainty estimate the coefficient bSoExp (social exposure) to be very close to zero) when stratifying by corona fear.

Model 7: crn_ || enrg | grm_ Testing whether the causal coefficient energy level becomes significantly close to zero when stratifying by germ level in a model that has corona fear as outcome variable.

adjustmentSets(dag_corona,exposure="energylvl", outcome="corona_fear") #crn_ _||_ enrg | grm_
## { sanitizing, social_exposure }
## { germ_level }
m7<- quap(
    alist(
        corona_fear_std ~ dnorm( mu , sigma ) ,
        mu <- a  + bGermLevel*germ_std + bEnergyLvl*energylvl_std,
        a ~ dnorm( 0 , 0.2 ) ,
        bGermLevel~ dnorm( 0 , 1 ),
        bEnergyLvl~ dnorm( 0 , 1 ) ,
        sigma ~ dexp( 1 )
    ) , data = df )

plot(coeftab(m7),by.model=TRUE )

As seen in the above plot, corona fear and energy level becomes independent (i.e., the model with almost no uncertainty estimates the coefficient bEnergyLevel to be very close to zero ) when stratifying by germ level.

Model 8: crn_ || enrg | sntz, scl_ Testing whether the causal coefficient energy level becomes significantly close to zero when stratifying by sanitizing and social exposure in a model that has corona fear as outcome variable.

adjustmentSets(dag_corona,exposure="energylvl", outcome="corona_fear") #crn_ _||_ enrg | grm_
## { sanitizing, social_exposure }
## { germ_level }
m8<- quap(
    alist(
        corona_fear_std ~ dnorm( mu , sigma ) ,
        mu <- a  + bEnergyLvl*energylvl_std + bSanitizing* sanitizing_std + bSoExp*social_exposure_std,
        a ~ dnorm( 0 , 0.2 ) ,
        bEnergyLvl~ dnorm( 0 , 1 ) ,
        bSanitizing~ dnorm( 0 , 1 ),
        bSoExp~ dnorm( 0 , 1 ),
        sigma ~ dexp( 1 )
    ) , data = df )

plot(coeftab(m8),by.model=TRUE )

As seen in the above plot, corona fear and energy level becomes independent (i.e., the model confidently estimates the coefficient bEnergyLvl (energy level) to be very close to zero) when stratifying by social exposure and sanitizing.

Model 9: crn_ || grm_ | sntz, scl_ Testing whether the causal coefficient for germ level becomes significantly close to zero when stratifying by sanitizing and social exposure in a model that has corona fear as outcome variable.

m9 <- quap(
    alist(
        corona_fear_std ~ dnorm( mu , sigma ) ,
        mu <- a + bGermlvl * germ_std  + bSanitizing * sanitizing_std + bSocial_exp * social_exposure_std + bOCD*OCD_std,
        a ~ dnorm( 0 , 0.2 ) ,
        bGermlvl ~ dnorm( 0 , 1 ),
        bSanitizing~ dnorm( 0 , 1 ),
        bSocial_exp~ dnorm( 0 , 1 ),
        bOCD ~ dnorm( 0, 1 ),
        sigma ~ dexp( 1 )
    ) , data = df )

plot(coeftab(m9),by.model=TRUE )

As seen in the above plot, bGermlvl becomes independent from corona fear (i.e., the model does with almost no uncertainty estimate the coefficient bGermlvl to be very close to zero) when stratifying by sanitizing and social exposure.

Model 10: enrg || sntz | grm_ Testing whether the causal coefficient for sanitizing becomes significantly close to zero when stratifying by germ level in a model that has energy level as outcome variable.

m10 <- quap(
    alist(
        energylvl_std ~ dnorm( mu , sigma ) ,
        mu <- a + bSanitizing*sanitizing_std + bGermLevel*germ_std,
        a ~ dnorm( 0 , 0.2 ) ,
        bSanitizing ~ dnorm( 0 , 1 ) ,
        bGermLevel~ dnorm( 0 , 1 ),
        sigma ~ dexp( 1 )
    ) , data = df )

plot(coeftab(m10),by.model=TRUE)

As seen in the above plot, bSanitizing becomes independent from energy level (i.e., the model does with almost no uncertainty estimate the coefficient bSanitizing to be very close to zero) when stratifying by germ level.

Model 11: enrg || scl_ | grm_ Testing whether the causal coefficient for social exposure becomes significantly close to zero when stratifying by germ level in a model that has energy level as outcome variable.

m11 <- quap(
    alist(
        energylvl_std ~ dnorm( mu , sigma ) ,
        mu <- a + bSocial_exp*social_exposure_std  + bGermlvl*germ_std,
        a ~ dnorm( 0 , 0.2 ) ,
        bSocial_exp ~ dnorm( 0 , 1 ) ,
        bGermlvl~ dnorm( 0 , 1 ),
        sigma ~ dexp( 1 )
    ) , data = df )

plot(coeftab(m11),by.model=TRUE)

As seen in the above plot, bSocial_exp becomes independent from energy level (i.e., the model does with almost no uncertainty estimate the coefficient bSocial_exp to be very close to zero) when stratifying by germ level.

Model 12: sntz || scl_ | crn_ Testing whether the causal coefficient for social exposure becomes significantly close to zero when stratifying by corona fear in a model that has sanitizing as outcome variable.

m12 <- quap(
    alist(
        sanitizing_std ~ dnorm( mu , sigma ) ,
        mu <- a + bSocial_exp*social_exposure_std  + bCorona_fear*corona_fear_std,
        a ~ dnorm(0,0.2) ,
        bSocial_exp ~ dnorm(0,1) ,
        bCorona_fear~ dnorm(0,1),
        sigma ~ dexp(1)
    ) , data = df )

plot(coeftab(m12),by.model=TRUE)

As seen in the above plot, bSocial_exp becomes independent from sanitizing (i.e., the model does with almost no uncertainty estimate the coefficient bSocial_exp to be very close to zero) when stratifying by corona fear.

Task 4: Messing it up

- Try and deliberately have an open back door path and see if you can get wrong inference.

We know that we will get wrong inference if we donøt include OCD in our model. Not including OCD in the model results in a different estimate of the level of corona fear’s influence on the chance of whether you get energylvl or not, that is zero influence of corona fear on influence. Contrarily, including OCD in the model and thus closing the backdoor

# Model with closed back door 
m_nobackdoor <- quap(
    alist(
        energylvl_std ~ dnorm( mu , sigma ) ,
        mu <- a + bCorona_fear*corona_fear_std + bOCD*OCD_std ,
        a ~ dnorm(0,0.2) ,
        bCorona_fear ~ dnorm(0,1) ,
        bOCD~ dnorm(0,1),
        sigma ~ dexp(1)
    ) , data = df )

plot(coeftab(m6),by.model=TRUE) # bCorona_fear coef around 0.3

# Model with open back door 
m_backdoor <- quap(
    alist(
        energylvl_std ~ dnorm( mu , sigma ) ,
        mu <- a + bCorona_fear*corona_fear_std,
        a ~ dnorm(0,0.2) ,
        bCorona_fear ~ dnorm(0,1) ,
        sigma ~ dexp(1)
    ) , data = df )

plot(coeftab(m_backdoor),by.model=TRUE) # bCorona_fear coef around 0.5

When comparing the plots of the two models above, it becomes clear that having an open back door path makes the causal effect that corona fear has on energy level appear stronger than it actually is. Closing the backdoor path results in an estimated posterior mean of the corona fear coefficient around 0.3. Having an open backdoor path results in an estimated posterior mean of apx. 0.5, thus making it look as if the effect is bigger than it actually is. This is due to OCD being a confound that affects energy level through a front door path via corona fear, but also as a back door path via sanitizing.

- Try and deliberately simulate some data that doesn’t fit the DAG, or create a new DAG that doesn’t fit the data.

- Use the same approach as above to show that the DAG is wrong (by showing that conditional independencies don’t exist in the data, for example).

# Simulating variables:
N <- 10000 

# OCD
OCD_w <- rnorm(N, mean=5, sd =0.001)
  
# Corona fear (assuming that it is normally distributed)
corona_fear_w <-rnorm(N, mean=50, sd=10)

# Social exposure (is dependent on corona_fear)
social_exposure_w <- rnorm(N, mean= (35+(energylvl*3)), sd= 1) #
plot(corona_fear_w,social_exposure_w)

# Sanitizing (is dependent on corona_fear and OCD)
sanitizing_w <-  rnorm(N, mean= (social_exposure_w/15), sd=2)
hist(sanitizing_w)

# Probability of being energylvl (depends on your germ level)
energylvl_w <-rnorm(N, mean=50-((social_exposure_w/100)), sd=1) 
hist(energylvl_w)

# Germ level (is dependent on both sanitizing & whether you wear face mask or not)
germ_level_w <-  rnorm(N, mean= 4600 - ((energylvl_w*15)), sd=100)
hist(germ_level_w)

# creating df
df_w <- tibble(germ_level_w, sanitizing_w, energylvl_w, social_exposure_w, corona_fear_w, OCD_w) 

Standardizing variables Standardizing variables:

df_w <- df_w %>% 
  mutate(germ_std_w=scale(germ_level_w),
         sanitizing_std_w= scale(sanitizing_w),
         energylvl_std_w=scale(energylvl_w),
         social_exposure_std_w=scale(social_exposure_w),
         corona_fear_std_w= scale(corona_fear_w),
         OCD_std_w = scale(OCD_w)
         )

Wrong model 1: OCD || enrg | germ Testing whether the causal coefficient of energy_level becomes significantly close to zero when stratifying by germ level in a model that has OCD as outcome variable.

adjustmentSets(dag_corona,exposure="OCD", outcome="energylvl") 
##  {}
m1_w <- quap(
    alist(
        OCD_std_w ~ dnorm( mu , sigma ) ,
        mu <- a + bEng*energylvl_std_w+bGermlvl*germ_std_w,
        a ~ dnorm( 0 , 0.2 ) ,
        bEng ~ dnorm( 0 , 1) ,
        bGermlvl ~ dnorm( 0 , 1) ,
        sigma ~ dexp( 1 ) 
    ) , data = df_w )

plot(coeftab(m1_w),by.model=TRUE)

As seen in the above plot, energy level becomes independent from OCF (i.e., the model does with almost no uncertainty estimate the coefficient bEng to be very close to zero) when stratifying by germ level.

Wrong model 2: OCD || enrg | sntz, scl_

Testing whether the causal coefficient of energy level becomes significantly close to zero when stratifying by sanitizing and social exposure in a model that has OCD as outcome variable.

adjustmentSets(dag_corona,exposure="energylvl", outcome="OCD") # choosing { sanitizing, social_exposure }
## { corona_fear, sanitizing }
## { sanitizing, social_exposure }
## { germ_level }
m2_w <- quap(
    alist(
        OCD_std_w ~ dnorm( mu , sigma ) ,
        mu <- a + bEng*energylvl_std_w + bSan*sanitizing_std_w+bSoc*social_exposure_std_w ,
        a ~ dnorm( 0 , 0.2 ) ,
        bEng ~ dnorm( 0 , 1) ,
        bSan ~ dnorm( 0 , 1 ) ,
        bSoc ~ dnorm( 0 , 1 ) ,
        sigma ~ dexp( 1 ) # OBS : why exp? It doesn't change anything changing it to dnorm or dunif
    ) , data = df_w )

plot(coeftab(m2_w),by.model=TRUE)

precis(m2_w)

As seen in the above plot, energy level becomes independent from OCF (i.e., the model does with almost no uncertainty estimate the coefficient bEng to be very close to zero) when stratifying by energy level, social exposure and

Wrong model 3: OCD || enrg | crn_, sntz Testing whether the causal coefficient of energy level becomes significantly close to zero when stratifying by corona fear and sanitizing in a model that has OCD as outcome variable.

adjustmentSets(dag_corona,exposure="energylvl", outcome="OCD") # { corona_fear, sanitizing }
## { corona_fear, sanitizing }
## { sanitizing, social_exposure }
## { germ_level }
m3_w<- quap(
    alist(
        OCD_std_w ~ dnorm( mu , sigma ) ,
        mu <- a  + bEng*energylvl_std_w + bCoronaFear*corona_fear_std_w + bSanitizing*sanitizing_std_w,
        a ~ dnorm( 0 , 0.2 ) ,
        bEng~ dnorm( 0 , 1 ) ,
        bCoronaFear~ dnorm( 0 , 1 ),
        bSanitizing ~ dnorm( 0 , 1 ),
        sigma ~ dexp( 1 )
    ) , data = df_w )


plot(coeftab(m3_w),by.model=TRUE)

As seen in the above plot, bEng becomes independent from OCD (i.e., the model does with almost no uncertainty estimate the coefficient bEng to be very close to zero) when stratifying by sanitizing and corona fear.

Wrong model 4: OCD || grm_ | sntz, scl_ Testing whether the causal coefficient of germ level becomes significantly close to zero when stratifying by sanitizing and social exposure in a model that has OCD as outcome variable.

m4_w <- quap(
    alist(
        OCD_std_w ~ dnorm( mu , sigma ) ,
        mu <- a  + bGermLevel*germ_std_w + bSanitizing*sanitizing_std_w + bSoc*social_exposure_std_w,
        a ~ dnorm( 0 , 0.2 ) ,
        bGermLevel ~ dnorm( 0 , 1 ),
        bSanitizing ~ dnorm( 0 , 1 ),
        bSoc~ dnorm( 0 , 1 ) ,
        sigma ~ dexp( 1 )
    ) , data = df_w )

plot(coeftab(m4_w),by.model=TRUE)

As seen in the above plot, benergylvl becomes independent from OCD when stratifying by germ level. That is, given the data and our model, posterior mean seems to be pretty close to zero.

Wrong Model 5: OCD || grm_ | crn_, sntz Testing whether the causal coefficient of germ level becomes significantly close to zero when stratifying by corona fear and sanitizing in a model that has OCD as outcome variable.

adjustmentSets(dag_corona,exposure="germ_level", outcome="OCD") # OCD _||_ grm_ | crn_, sntz
## { corona_fear, sanitizing }
## { sanitizing, social_exposure }
m5_w <-  quap(
    alist(
        OCD_std_w ~ dnorm( mu , sigma ) ,
        mu <- a + bGerm*germ_std_w  + bCoronaFear*corona_fear_std_w + bSanitizing*sanitizing_std_w,
        a ~ dnorm( 0 , 0.2 ) ,
        bGerm~ dnorm( 0 , 1 ) ,
        bCoronaFear~ dnorm( 0 , 1 ),
        bSanitizing~ dnorm( 0 , 1 ),
        sigma ~ dexp( 1 )
    ) , data = df_w )

plot(coeftab(m5_w),by.model=TRUE)

As seen in the above plot, germ level becomes independent from OCD (i.e., the model does with almost no uncertainty estimate the coefficient bGerm (germ level) to be very close to zero) when stratifying by face corona fear and sanitizing.

Wrong Model 6:OCD || scl_ | crn_ Testing whether the causal coefficient of social exposure becomes significantly close to zero when stratifying by corona fear in a model that has OCD as outcome variable.

adjustmentSets(dag_corona,exposure="social_exposure", outcome="OCD") #OCD _||_ scl_ | crn_
## { corona_fear }
m6_w<- quap(
    alist(
        OCD_std_w ~ dnorm( mu , sigma ) ,
        mu <- a + bCorona_fear*corona_fear_std_w + bSoExp* social_exposure_std_w,
        a ~ dnorm( 0 , 0.2 ) ,
        bCorona_fear~ dnorm( 0 , 1 ),
        bSoExp~ dnorm( 0 , 1 ),
        sigma ~ dexp( 1 )
    ) , data = df_w )

plot(coeftab(m6_w),by.model=TRUE)

As seen in the above plot, social exposure becomes independent from OCD (i.e., the model does with almost no uncertainty estimate the coefficient bSoExp (social exposure) to be very close to zero) when stratifying by corona fear.

Wrong Model 7: crn_ || enrg | grm_ Testing whether the causal coefficient energy level becomes significantly close to zero when stratifying by germ level in a model that has corona fear as outcome variable.

adjustmentSets(dag_corona,exposure="energylvl", outcome="corona_fear") #crn_ _||_ enrg | grm_
## { sanitizing, social_exposure }
## { germ_level }
m7_w<- quap(
    alist(
        corona_fear_std_w ~ dnorm( mu , sigma ) ,
        mu <- a  + bGermLevel*germ_std_w + bEnergyLvl*energylvl_std_w,
        a ~ dnorm( 0 , 0.2 ) ,
        bGermLevel~ dnorm( 0 , 1 ),
        bEnergyLvl~ dnorm( 0 , 1 ) ,
        sigma ~ dexp( 1 )
    ) , data = df_w )

plot(coeftab(m7_w),by.model=TRUE )

As seen in the above plot, corona fear and energy level becomes independent (i.e., the model with almost no uncertainty estimates the coefficient bEnergyLevel to be very close to zero ) when stratifying by germ level.

WrongModel 8: crn_ || enrg | sntz, scl_ Testing whether the causal coefficient energy level becomes significantly close to zero when stratifying by sanitizing and social exposure in a model that has corona fear as outcome variable.

adjustmentSets(dag_corona,exposure="energylvl", outcome="corona_fear") #crn_ _||_ enrg | grm_
## { sanitizing, social_exposure }
## { germ_level }
m8_w<- quap(
    alist(
        corona_fear_std_w ~ dnorm( mu , sigma ) ,
        mu <- a  + bEnergyLvl*energylvl_std_w + bSanitizing* sanitizing_std_w + bSoExp*social_exposure_std_w,
        a ~ dnorm( 0 , 0.2 ) ,
        bEnergyLvl~ dnorm( 0 , 1 ) ,
        bSanitizing~ dnorm( 0 , 1 ),
        bSoExp~ dnorm( 0 , 1 ),
        sigma ~ dexp( 1 )
    ) , data = df_w )

plot(coeftab(m8_w),by.model=TRUE )

As seen in the above plot, corona fear and energy level becomes independent (i.e., the model confidently estimates the coefficient bEnergyLvl (energy level) to be very close to zero) when stratifying by social exposure and sanitizing.

Wrong model 9: crn_ || grm_ | sntz, scl_ Testing whether the causal coefficient for germ level becomes significantly close to zero when stratifying by sanitizing and social exposure in a model that has corona fear as outcome variable.

m9_w <- quap(
    alist(
        corona_fear_std_w ~ dnorm( mu , sigma ) ,
        mu <- a + bGermlvl * germ_std_w  + bSanitizing * sanitizing_std_w + bSocial_exp * social_exposure_std_w + bOCD*OCD_std_w,
        a ~ dnorm( 0 , 0.2 ) ,
        bGermlvl ~ dnorm( 0 , 1 ),
        bSanitizing~ dnorm( 0 , 1 ),
        bSocial_exp~ dnorm( 0 , 1 ),
        bOCD ~ dnorm( 0, 1 ),
        sigma ~ dexp( 1 )
    ) , data = df_w )

plot(coeftab(m9_w),by.model=TRUE )

As seen in the above plot, bGermlvl becomes independent from corona fear (i.e., the model does with almost no uncertainty estimate the coefficient bGermlvl to be very close to zero) when stratifying by sanitizing and social exposure.

Wrong Model 10: enrg || sntz | grm_ # OBS: WRONG Testing whether the causal coefficient for sanitizing becomes significantly close to zero when stratifying by germ level in a model that has energy level as outcome variable.

m10_w <- quap(
    alist(
        energylvl_std_w ~ dnorm( mu , sigma ) ,
        mu <- a + bSanitizing*sanitizing_std_w + bGermLevel*germ_std_w,
        a ~ dnorm( 0 , 0.2 ) ,
        bSanitizing ~ dnorm( 0 , 1 ) ,
        bGermLevel~ dnorm( 0 , 1 ),
        sigma ~ dexp( 1 )
    ) , data = df_w )

plot(coeftab(m10_w),by.model=TRUE)

As seen in the above plot, bSanitizing does not become independent from energy level when stratifying by germ level. This violates one of the listed conditional independencies and tells us that our DAG is not compatible with “the wrong data”.

Wrong Model 11: enrg || scl_ | grm_ Testing whether the causal coefficient for social exposure becomes significantly close to zero when stratifying by germ level in a model that has energy level as outcome variable.

m11_w <- quap(
    alist(
        energylvl_std_w ~ dnorm( mu , sigma ) ,
        mu <- a + bSocial_exp*social_exposure_std_w  + bGermlvl*germ_std_w,
        a ~ dnorm( 0 , 0.2 ) ,
        bSocial_exp ~ dnorm( 0 , 1 ) ,
        bGermlvl~ dnorm( 0 , 1 ),
        sigma ~ dexp( 1 )
    ) , data = df_w )

plot(coeftab(m11_w),by.model=TRUE)

As seen in the above plot, bSocial_exp does not become independent from energy level when stratifying by germ level. This violates one of the listed conditional independencies and tells us that our DAG is not compatible with “the wrong data”.

Model 12: sntz || scl_ | crn_ Testing whether the causal coefficient for social exposure becomes significantly close to zero when stratifying by corona fear in a model that has sanitizing as outcome variable.

m12 <- quap(
    alist(
        sanitizing_std_w ~ dnorm( mu , sigma ) ,
        mu <- a + bSocial_exp*social_exposure_std_w  + bCorona_fear*corona_fear_std_w,
        a ~ dnorm(0,0.2) ,
        bSocial_exp ~ dnorm(0,1) ,
        bCorona_fear~ dnorm(0,1),
        sigma ~ dexp(1)
    ) , data = df_w )

plot(coeftab(m12),by.model=TRUE)

As seen in the above plot, bSocial_exp does not become independent from sanitizing when stratifying by germ level. This violates one of the listed conditional independencies and tells us that our DAG is not compatible with “the wrong data”.